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In many stochastic simulations of biochemical reaction networks, it is desir- 
able to "coarse-grain" the reaction set, removing fast reactions while retain- 
ing the correct system dynamics. Various coarse-graining methods have been 
proposed, but it remains unclear which methods are reliable and which re- 
actions can safely be eliminated. We address these issues for a model gene 
regulatory network that is particularly sensitive to dynamical fluctuations: 
a bistable genetic switch. We remove protein-DNA and/or protein-protein 
association-dissociation reactions from the reaction set, using various coarse- 
graining strategies. We determine the effects on the steady-state probability 
distribution function and on the rate of fluctuation-driven switch flipping tran- 
sitions. We find that protein-protein interactions may be safely eliminated from 
the reaction set, but protein-DNA interactions may not. We also find that it 
is important to use the chemical master equation rather than macroscopic rate 
equations to compute effective propensity functions for the coarse-grained re- 
actions. 
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I. INTRODUCTION 
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Biochemical reaction networks control how living cells function. Computer simulations provide 
a valuable tool for understanding how complex biochemical network architecture is connected to 
cellular function. A popular method for simulating biochemical networks is the "Stochastic Sim- 
ulation Algorithm" (SSA) which was introduced in this field by Gillespie^. For many reaction 
networks, however, SSA simulations are prohibitively expensive because of "time-scale separation": 
the reaction set contains some reactions which occur much more frequently than others. For every 
"slow" reaction event, many "fast" reaction events have to be simulated. This problem has led to the 
development of various methods for coarse-graining the reaction se t£i£iiij&L&2iI2 _ that is, eliminating 
the fast reactions and simulating only the slow reactions. Key issues are which fast reactions can 
safely be eliminated, and how this should be done, so as not to disturb the original dynamics. In this 
paper, we address these issues for a biochemical network which is especially sensitive to dynamical 
fluctuations: a bistable genetic switch. Because of its sensitivity, this model provides a useful test 
system for assessing how to coarse-grain biochemical networks. We expect our conclusions to be 
valid for a wide range of biochemical networks where fluctuation-driven processes are important. 

The SSA is a kinetic Monte Carlo method which generates trajectories for the numbers of molecules 
of each chemical species in the reacting system. The molecular discreteness of the reacting species 
is included, and the method includes stochastic fluctuations in the numbers of molecules, assuming 
that each reaction is a Poisson process. It is assumed that the system is well-stirred - i.e. the 
spatial distribution of the components is ignored (alternative methods that do include spatial effects 
have recently been develop ed^^^*) . The SSA generates trajectories that are consistent with the 
chemical master equation and it has been widely applied in the field of biochemical networks. A 
detailed explanation of the method can be found in Refs.-^. We note that some authors include a 
factor 2 in the definition of propensities for second order reactions. We choose not to use this factor 
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2. 

The SSA becomes inefficient when some of the reaction channels ("the fast reactions") have much 
higher propensities than others ( "slow reactions" ) - this is known as the time-scale separation prob- 
lem, and several methods for dealing with it have been proposed. In all cases, the first step is 
to identify which reactions are "fast" and which are "slow". The criterion is generally that fast 
reactions should reach a steady state faster than the waiting time between slow reaction events. 
The slow reactions are generally simulated using the SSA, while the various methods differ in their 
treatment of the fast reactions. In one class of methods, the fast reactions are propagated using 
the deterministic or chemical Langevin equation^^^, or via the r leap method^ 1 ^; these schemes 
require that the species in the fast reactions are present in large copy numbers. In another class 
of methods, which we consider here, the temporal evolution of the fast reactions is given by the 
chemical master equation. Because these algorithms take account of molecular discreteness, they do 
not require the species in the fast reactions to be present in large copy numbers^^*^ 1 ^ 1 ^. These 
schemes are discussed in more detail in section IIHI 

In this paper, we compare various coarse-graining strategies using a bistable genetic switch. 
Bistable genetic switches are excellent model systems for testing the accuracy of coarse-graining 
schemes. The reason is that the spontaneous flipping from one stable state to the other is driven 
by fluctuations — bistable switches are thus particularly sensitive to random fluctuations and hence 
to coarse-graining, which tends to remove fluctuations from the system. Moreover, these switches 
exhibit dynamics over a wide range of time scales, making them ideal for testing the assumption of 
time scale separation that underlies all coarse-graining schemes. The dimerization of proteins and 
the binding of dimers to the DNA occurs on relatively fast time scales of seconds to minutes, while 
the synthesis and degradation of proteins occurs on relatively slow time scales of tens of minutes or 
longer. Another important time scale is given by the duration of the switching trajectories, which 
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involve a combination of DNA binding, dimerisation and protein production and decay events. Ge- 
netic switches also show dynamics on an even longer time scale, which is the waiting time between 
switch flipping events. Since dimerisation and DNA binding occur on much faster time scales than 
the inverse switching rate, one might expect that these reactions could be integrated out without 
affecting the switching dynamics. On the other hand, previous work has shown that DNA binding 
fluctuations can actually drive the flipping of genetic switches^ 1 ^. It is thus unclear whether these 
reactions can safely be eliminated in simulations of genetic switches. 

We investigate the consequences of integrating out dimerisation and DNA binding for the switching 
dynamics of a genetic toggle switch that consists of two genes that mutually repress each other. A 
simple mathematical model of such a toggle switch has been studied by several groups^ 1 ^ 1 ^ 1 ^ 1 ^ 1 ^ 1 ^. 
Using the SSA in combination with the recently developed "Forward Flux Sampling" (FFS) method 
for rare event simulations^ 1 ^ 1 ^, we compute the steady-state distributions of all species and the 
rate at which the network undergoes spontaneous fluctuation-driven flips between its two stable 
states. We use these quantities as measures for the accuracy of various coarse-graining strategies for 
eliminating both DNA binding and dimerisation; the strategies differ in whether the deterministic 
macroscopic rate equations or the stochastic chemical master equation is used to compute averages 
over the fast reactions. 

Our results show that it is essential to use the chemical master equation approach rather than 
the macroscopic rate equation. This is perhaps not surprising, since the DNA and proteins are 
present in low copy numbers. In addition, we find that the steady-state distributions are quite 
robust to integrating out dimerisation and protein-DNA binding; in particular, the location and 
the shape of the peaks of the bimodal steady-state distributions are hardly affected by eliminating 
these fast reactions from the reaction set. In contrast, coarse-graining the dimerisation and DNA 
binding reactions can have a dramatic effect on the flipping rate. These results can be understood by 
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noting that the steady-state distribution is predominantly due to the dynamics within the basins of 
attraction corresponding to the stable states. Within these basins, the time scales of DNA binding 
and dimerisation are typically much faster than those of protein production and degradation; the 
time-scale separation requirement is thus satisfied most of the time. In contrast, the switching rate 
is determined by how the system leaves a basin of attraction. This relies on rare fluctuations of the 
fast reactions: in order for the switch to flip, the minority species has to dimerize and subsequently 
bind to the DNA target site. Indeed, as we discuss in more detail in a forthcoming paper—, during 
the flipping trajectories, the dynamics of DNA binding and dimerisation typically do not relax to 
a steady state. Thus, the reliability of a particular coarse-graining scheme strongly depends on 
the quantities one is interested in: equilibrium properties will typically show a low sensitivity to 
approximation procedures, whereas for fluctuation-driven properties, small errors will be amplified, 
leading to incorrect results. 

In the next section, we describe the model genetic switch. In section 1111} we give background 
information on the various coarse-graining schemes, and in section IIVt we discuss these in the 
context of the model switch. In Section |Vj we present results on the accuracy of the various coarse- 
graining procedures, using the stationary distribution and the switching rates as read-outs, and 
their computational speed-up. We end with a discussion on the implications of our findings for the 
simulations of complex biochemical networks. 

II. THE MODEL GENETIC SWITCH 

The model bistable genetic switch is shown schematically in Figure [1] and the set of reactions is 
listed in Table [p^^. 

As shown in Figured], two genes A and B are transcribed in divergent directions, under the control 
of a single DNA operator region, O, which contains one binding site. Coding region A encodes protein 
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FIG. 1: Pictorial representation of our model switch, corresponding to Table [I] Two divergently-transcribed genes A and B 
are under the control of a single regulatory binding site O. Proteins A and B both form homodimers, each of which can bind 
to O and block transcription of the other gene. 
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TABLE I: Reactions and propensity functions for the model genetic switch. 

A, while coding region B encodes protein B. Both proteins A and B are transcription factors, which, 
upon homo dimerisation, are able to bind to the operator sequence O. When A2 is bound at O, the 
transcription of B is blocked [A2 is a repressor for B]; while, conversely, when B2 is bound at O, the 
transcription of A is blocked [B 2 is a repressor for A]. When neither A 2 or B 2 is bound at O, both A 
and B are transcribed at the same average rate /c pro d- Protein monomers are also removed from the 
system with rate //, modelling active degradation processes as well as dilution due to cell growth. 
For convenience in this model system, we use the rather high value /x = 0.3/c pro d, corresponding 
to the case where removal from the cell is dominated by active degradation. In our model, we 
assume that all the steps leading to production of a protein molecule (transcription, translation and 
protein folding) can be modelled as a single Poisson process with rate constant fc pro d. The system is 
symmetric on exchanging A and B. 
For this model system, bistability has been demonstrated using a mean field analysis and with 
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simulations^. In one stable state, a large number of A proteins are present; this ensures that the 
operator O is mostly bound by A 2 , keeping B repressed. Conversely, in the other stable state, B 
proteins are abundant, so that O is mostly bound by B 2 , and A remains repressed. Previous work has 
demonstrated that, when simulated stochastically with appropriate parameters, the system makes 
occasional random flipping transitions between these two stable states^, as in Figure [2j In our 
simulations, we use the inverse of the production rate, &C rod as the unit of time; typical values for 
this rate are in the order of 10 -3 — 10 s -1 . We assume that the cell volume V remains constant. 
For simplicity, we use a value V — l, and define our rate constants in appropriate units. We choose 
a "baseline" set of parameters, in the region of parameter space where the system has previously 
been found to be bistable: k{ = 5k prod V, k h = 5k prod (so that = k d /kf = 1/V), k on = 5k prod , 
Kfi = k prod (so that = k of[ /k on = 1/(5V)), fi = 0.3k prod . This model system is loosely based on 
the bacteriophage lambda genetic switch^. For the phage lambda proteins cro and cl, assuming 
diffusion-limited protein-DNA association and using refs^ and^, k on ~ 5 — 10k prod . For cl and cro 
dimer formation, using refs 3 ^ and 3 ^, k{ ~ 50— 100A;p ro d. Protein degradation rates are much lower for 
phage lambda (of the order of /x ~ 0.1 — 0.01k pTod ) than for our model system. 

Throughout this paper, we represent the number of molecules of chemical species X which is 
present in the cell by nx- Later in the paper, we will need to characterize the switching process 
by an "order parameter", which we denote A. A natural choice is the difference between the total 
numbers of the two proteins in the cell: A = «a + 2riA 2 + 2noA 2 — ( n B + 2nB 2 + 2uob 2 )- Figure [2] 
shows A plotted as a function of time for a simulation of this reaction set using the SSA. Bistable 
behaviour is indeed observed: the system spends most of its time in one of the two stable states with 
occasional transitions between states. The average duration of a flipping transition event is much 
shorter than the average "waiting time" between the flipping transitions. 



in. DYNAMICAL COARSE-GRAINING: BACKGROUND 
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Dynamical coarse-graining schemes begin by splitting the reaction set into fast and slow reactions, 
as described in Section [H The slow reactions are generally simulated using the SSA. The fast 
reactions are approximated in ways that differ for different methods. In this paper, we only consider 
approaches in which the fast reactions are removed entirely from the reaction set, by assuming 
that they relax to a steady state faster than the waiting time between slow reactions. Effective 
propensities for the slow reactions are computed as averages over the steady state distribution, 
obtained from the chemical master equation for the fast reactions. 

The key step is the determination of the effective propensity functions a S j(n s ,nf) for the slow 
reactions in the coarse-grained reaction scheme. These depend on the copy numbers n s of the "slow 
species" (those that are only affected by the slow reactions), and n/ of the "fast species" (those that 




FIG. 2: Typical simulation trajectory for the model switch, with baseline parameters except for fi which is replaced by 
fi — 0.45fc pro d. The total numbers of A and B molecules fluctuate around two stable states, one rich in A and the other rich in 
B. Transitions between these states are rapid, yet infrequent. 
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are affected by both the fast and the slow reactions). The effective propensities are given by 

a*(n/,n s ) = y^P 00 (ri f \nf,n a )a'(n' f ,n a ), (1) 

where denotes the propensity function for a given slow reaction j and Poo(n'j-\nf,n s ) is the prob- 
ability of obtaining a given copy number n'j- for the fast species, at the end of a very long simulation 
of the fast reaction set only, starting from state space point (nj,n s ). These effective propensities 
are designed to give the same flux along the slow reaction channel, on average, as in the full system. 
In essence, it is assumed that in the full simulation, for any configuration n s of the slow species, 
the fast species reach a "quasi-steady state", with probability distribution function P(nf\n s ), on a 
timescale that is much faster than the waiting time before the next slow reaction. In practice, the 
effective propensity functions in Eq. ([I]) can be obtained by performing short SSA simulations of 
the fast reactions at fixed copy numbers of the slow species*^. Alternatively, one may solve the 
chemical master equation for the fast reactions analytically or numerically^^, which is the approach 
we employ here. 

It is important to discuss the definition of the "fast variables" (rif in Eq. (TjQ)) and "slow variables" 
(ris)^ 1 ^. In the work of Cao et a/.-, the slow variables are the copy numbers of those species which 
are unaffected by the fast reactions, while the fast variables can be changed by both fast and slow 
reactions. During the coarse-grained simulation, only the slow reactions are simulated, and one has 
the option of propagating in time either both the fast and the slow variables or just the slow variables. 
Bundschuh et a/.- describe a slightly different method for eliminating both the fast reactions and the 
fast variables from the simulation scheme. Here, as in the work of Cao et al, the fast variables are 
the copy numbers of all species which are affected by the fast reactions. The slow variables, however, 
are made up of the copy numbers of species which are unchanged by the fast reactions, as well as 
combinations of the fast variables. These combinations are chosen so that they are unchanged by 
the fast reactions. For example, for a fast reaction set 2A ^ A2, an appropriate slow variable would 
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be = «a + 2nA 2 . The original slow reaction set is then rewritten in terms of these new slow 
variables. This eliminates all the fast species from the slow reaction set and the simulation proceeds 
by simulating only the slow variables. This is the strategy which we adopt in this paper. 

iv. COARSE-GRAINING FOR THE MODEL GENETIC SWITCH 

Stochastic simulations of the model genetic switch have characteristic features that depend on the 
timescale over which we observe the simulation. In a timeframe of about 0.2k~^ od , we will observe 
mainly protein-protein association and dissociation events (typical timescale [tza(^a — and 
[uA-ikb]^ 1 respectively), as well as protein-operator association and dissociation (typical timescale 
[ n A 2 k n]~ 1 and [/Cofr]' 1 )- If we extend our observation "window" to a timeframe of about 4fc~ od , we 
observe also protein production and degradation (typical timescale [fcprod] -1 ) and ~3/Cp rod ). In a 
much longer timeframe, we observe flipping events between the two stable states. It is these flipping 
events that are the phenomenon of interest - yet for each "interesting" switch flipping event, very 
many "less interesting" association and dissociation events need to be simulated. This is an example 
of timescale separation, which we seek to overcome by coarse-graining - eliminating the protein- 
protein and/or protein-DNA association and dissociation reactions from the simulation scheme. 

To coarse-grain the model genetic switch, we divide the full reaction set (Tabled]) into fast and slow 
reactions. We will consider three cases: (i) protein-protein association and dissociation reactions (a 
in Table [I]) are fast, (ii): protein-DNA association and dissociation reactions (b in Table [I]) are 
fast, and (iii): both reactions (a) and (b) are fast. For each of these cases, we define fast and 
slow variables. The fast variables are the copy numbers nf of species which are affected by the 
fast reactions. The slow variables n s are either the copy numbers of species unaffected by the fast 
reactions, or linear combinations of fast variables which are unchanged by any of the fast reactions 
- e.g. n- K = riA + 2n^ 2 . To accompany these slow variables, we define "slow species". These may 
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represent either single chemical species, or combinations of species. For example, the species A 
represents an A molecule which is in either monomer or dimer form. We then rewrite the slow 
reaction set in terms of the new slow species, and carry out a simulation using Gillespie's SSA, with 
effective propensities given by Eq. (Q. For this system, only the first moment of P oc (nj|nj, n s ) is 
required. This can be obtained by solving the chemical master equation for the fast reactions at fixed 
value of the slow variables. Alternatively, one may approximate to the macroscopic rate equations 
for the fast reactions. A summary of the various coarse-graining methods used in the paper is given 
in Table [Til Table [TTJ also lists the coarse-grained reaction sets, and gives formulae for the effective 
propensity functions. The notation (X)gi^jw variables denotes the first moment (average) of the 
steady state probability distribution function P 00 (nj|n/, n s ), the superscript denoting whether the 
master equation or macroscopic rate equation is used to find the average, and the subscript indicating 
which slow variables the average depends upon. 

A. Coarse-graining protein-DNA binding 

We first remove the protein-DNA association and dissociation reactions 

O + A 2 # OA 2 O + B 2 v± OB 2 (2) 

from the original reaction scheme (Table UJ). We denote this approach "Eliminating Operator state 
fluctuations (EG*)". In our coarse-grained simulation, the system will still experience fluctuations 
due to protein-protein association and dissociation, protein production and protein decay, but not 
those due to the binding and unbinding of molecules to the DNA. 

The "fast species", which are affected by reactions (j2j), are A 2 , B 2 , OA 2 , OB 2 and O. The "slow 
species" are A, B, A 2 and B 2 , where A and B are simply the protein monomers - these are unchanged 
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Name 


Reaction 


Propensity 


Method 


Coarse-grained 
variable 


Definition 


EO 


0^ A 
0^B 
A + A # A 2 
B + B ^ B 2 
A^0 
B^0 


feprod {no + no As)!^ 
fcprod (no +"OB,)™g 2 
fco n nA(n A ~l), fc off<nA 2 )™g 9 
fconn B (riB-l), fc ff(n B2 )™g 2 

/J71 A 

/in B 


Rate Equation 


A 2 
B 2 


n A 2 = nA 2 + n OA 2 

n^ 2 = riB 2 + n B 2 


EDI 


+ 2A ^ OA 2 
O + 2B ^ OB 2 
O^O + k 
O^O + B 
OA 2 -» OA 2 + A 
OB 2 -> OB 2 + B 
A^0 
B^0 


L{iA 2 )f E , k ffnoA 2 

fcon{nB 2 )|' E , fc ffriOB 2 

/CprodTlO 

^prod^O 
^prod^OA 2 
fcprodn-OB 2 

»<n B )t E 


Rate Equation 


A 
B 


«A = n A + 2n A2 

n B = n B + 2n B2 


ED 2 


O + 2A ^ OA 2 
O + 2B ^ OB 2 
O^O + k 
O^O + B 
OA 2 -» OA 2 + A 
OB 2 -> OB 2 + B 
A^0 
B ^0 


fc on (nA 2 )^ B , k oS n A 2 

fcon{nB 2 )g , fcoffnoB 2 

^prod^O 

^prod^O 
&prod7!OA 2 
fcprod«-OB 2 

/*<»A>r 


Master Equation 


A 
B 


«A = n A + 2n A2 

n B = n,B + 2n B2 


EO-ED1 


-> A 
-> B 
A^0 
B ^0 


; / \ RE 

k P Tod\no + noA 2 )^ B 
k P rod(no + iOB 2 )^ E g 

/ \RE 
M\ n B/A.B 


Rate Equation 


A 
B 


n A = nA + 2wa 2 + 2noA 2 
rig = 71b + 2tib 2 + 2wob 2 


EO-ED2 


— > A 
-> B 
A^0 
B^0 


; / \ ME 
kprod{no + n OA 2 )^g 

k P rod(no + noB 2 ) 1 x% 
u<n A )f| 


Master Equation 


A 
B 


n A = nA + 2wa 2 + 2noA 2 
Tig = 71b + 2tib 2 + 2tiob 2 



TABLE II: Summary of coarse-graining schemes for the original reaction set (Table [I]): eliminating operator binding (EO), 
eliminating dimerisation reactions using the Macroscopic Rate Equation (EDI) or the Master Equation (ED2), eliminating 
both dimerisation and operator binding using the Macroscopic Rate Equation (EO-ED1) or the Master Equation (EO-ED2). 
For each coarse-graining scheme, the coarse-grained reaction set is indicated together with the propensity function for each 
reaction. We also give definitions of the new slow variables for each scheme. 

by the fast reactions (j2J) - and A 2 and B 2 are new species, such that: 



™a 2 = n A 2 + n A 2 (3) 
n B 2 = n B 2 + n Q B 2 
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and n Ba are simply the total numbers of dimers in the system - including both free and DNA- 
bound dimers. The new, coarse-grained, reaction set is given in Table [TTJ together with the effective 
propensities. As the operator O has been removed from the scheme, protein production has become 
a simple birth process, in which a monomer appears from "nowhere" . The propensity for this birth 
of a monomer (say A) takes into account the "lost" reactions - it reflects the probability, in the full 
reaction scheme, of finding the promoter O in one of the states O and OA 2 that are able to produce 
A. The protein-protein interactions ((a) in Tabled]) have been changed to reflect the fact that free 
dimers A2 have been replaced by the new species A2. Two monomers can now reversibly associate to 
generate a molecule of A 2 , while the reaction representing dissociation of free dimers to monomers 
has an effective propensity that depends on the average number of free dimers that would be present 
in the full reaction scheme, for a given value of total dimers. Protein degradation ((e) in Table 
|T]) remains unchanged since these reactions affect only monomers. 

To evaluate the effective propensities listed in Table HTl we require the averages («-o+^oa 2 )a 2 b ' 
(no + Wob 2 ) a 2 B 2 >( n A 2 )A 2 b 2 an d ( n B 2 )A 2 B2- These depend on both slow species A 2 and B 2 , because 
the two operator binding reactions are coupled. This arises from the competition between A 2 and 
B 2 for the same binding site. In this particular shown in Appendix El solving the master 

equation for the fast reactions ([2]) and approximating them by the corresponding macroscopic rate 
equations give the same result, so we consider only the rate equation approach. Solving for the 
steady state of Eqs. (T5J), we obtain 

K(noA 2 )t A = (no)Zv 2 ■ Mfl^ (4) 
^D>OB 2 >?f, B2 = MZb 2 ■ K 2 )£b 2 . 
Combining this with the fact that in our scheme there is only one DNA copy: 

n Q + n Q A 2 + n Q B 2 = 1 (5) 
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we obtain 



o-A,eS — k piod (n + noh 2 )\ 




(6) 



) 



and similarly for aeeff- To find (n\ 2 )^ E ^ and (?t.b 2 )?' E a in Eq. (ISj), we combine relations (BJ and 



(EJ) with 



A 2 ~ 



riA 2 + n Q A 2 



(7) 



™B 2 = n B 2 +^OB 2 - 

Numerical solution techniques are required here, and we have used the Newton-Raphson method 36 . 



from our original reaction scheme (Table [I]). We denote this approach "Eliminating dimerisation 
(ED)". These interactions are particularly attractive candidates for coarse-graining, since they tend 
to consume a large fraction of the computational effort when there are significant numbers of free 
monomers and dimers in the system. 

The "fast" species - those whose number is affected by reactions ((H]) - are A, A 2 , B and B 2 . The 
"slow species", which will remain in our coarse-grained reaction scheme, are O, OA 2 , OB 2 - species 
from the original scheme which are not affected by reactions ([8]) - together with two new species, A 



B. Coarse-graining protein-protein binding 



We now remove instead the protein-protein association and dissociation reactions 



A + A # A 2 




B + B # B 2 



15 

and B, denned by: 

ra A = ra A + 2ra Aa (9) 
ng = n B + 2n Ba 

These new species A and B are combinations of the fast species whose number remains unchanged 
by the fast reactions ((a) in Tabled]). The new, coarse-grained, reaction set with the corresponding 
propensity functions, is given in Table HT1 The protein production reactions ((c) and (d) in Table [I]) 
now produce the new species A and B. In the original reaction set (Table [I]), the protein degradation 
reactions (e) affected only monomers. The corresponding reaction in the new reaction set removes a 
molecule of the new species A and B from the system, but with an effective propensity that depends 
on the average number of monomers that would be obtained by a simulation of the fast reactions, 
at fixed n k or n B . Similarly, reactions (b) in the original set, corresponding to the association and 
dissociation of dimers with the DNA, have been replaced by the association/dissociation of two units 
of A or B to O, with an effective propensity proportional to the average number of dimers given 
by the fast reaction set for fixed n A or ra B . Here, the averages required for the effective propensity 
functions depend on only one slow variable - either A or B but not both - in contrast to method 
EO, where the averages depend on both slow variables. This is because the two reactions flSJ) are 
not coupled to each other: dimerisation of A has no direct effect on the dimerisation propensity of 
B and vice versa. 

We shall test two alternative approaches to the computation of the averages (tia)a) ( n A 2 )xi ( n B)e 
and (ra.B 2 ) B i n Table [TTJ In the first approach, which we denote EDI, we make the approxima- 
tion that these averages correspond to the steady state solutions of the macroscopic rate equations 
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corresponding to (jSJ): 



h(n A2 ) k - kf(n A )\ = 



(10) 



h(n B2 ) B ~ h(n B )\ = 



so that 



(11) 




B _ 




Relations (TTTT) can be used together with the definitions ([2]) to give 



8n k /K* + 1-1/4 



(12) 



8n B /K* + 1-1/4. 



The average numbers of dimers (wa 2 )a and (^b 2 )b are given in this approximation by combining 
(I12p with dHJ). Method EDI is expected to give incorrect results when n A or n B is small, since the 
macroscopic rate equation approximation breaks down in this limit. This is expected to be a serious 
problem, because for our genetic switch model, both and n B will be small at the crucial moments 
when the switch is in the process of flipping between the two steady states^. Alternatively, one may 
solve the master equation corresponding to the eliminated reactions (jSJ) to compute the averages. 
We denote this approach ED2. Numerical solution of this master equation, as described in Appendix 
|B| results in the probability distribution functions p(n^\n k ), p(nA 2 \n k ), p(nB\n B ) and p{n B2 \n B ) for 
the fast variables, for given values of A and B. These can be used to find (tia)^, ( n A 2 )\ m , ( n B)g E 
and (^b 2 )b E - These averages are then substituted into the expressions given in Table HT1 to obtain 
effective propensities for the coarse-grained SSA simulation. We note that the effective propensity 
functions for methods EDI and ED2 in Table [III are identical. The only difference between the two 
schemes is the way in which the required averages are obtained: using a macroscopic rate equation 
approximation (EDI) or by numerical solution of the master equation (ED2). 
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c. Coarse-graining protein-DNA and protein-protein binding 

We now eliminate both protein-DNA interactions [Eq.([2])] and protein-protein interactions [Eq.flEJ)]. 
We will be left with a coarse-grained scheme in which the only fluctuations are due to protein 
production and degradation. Our "fast reactions" are then d2J) and (jHD, and our "fast species", 
whose number is changed by the fast reactions, are in fact all the species in the original scheme: O, 
OA 2 , OB 2 , A, A 2 , B and B 2 . Our only slow species, A and B, are then combinations of the fast 
species whose number is unchanged in any of the fast reactions: 

n k =n A + 2n M + 2n A 2 (13) 
™ B = n B + 2n B2 + 2n OB2 . 

and ng correspond to the total number of A and B molecules in the system. On removal of 
reactions (J2J) and (jSJ), our coarse-grained reaction scheme, given in Table HTT1 under the labels EO- 
ED1 and EO-ED2, consists simply of a pair of birth-death processes for species A and B. The effects 
of the "lost" fast reactions are incorporated via effective rate constants that account for the average 
number of the relevant fast species expected in a simulation of the fast reaction set, for fixed numbers 
of the slow species. As in the EO coarse-graining scheme, but not in the ED schemes, the averages 
here depend upon both slow species, since the fast reactions for A and B are coupled by the shared 
DNA binding sites. 

As for the ED schemes, we consider two alternative ways of obtaining the necessary averages 

(no) a,b> ( n A)A,B> (™a 2 )a,b> (™oa 2 )a,b> (^b)a,b> (™b 2 )a,b and (™ob 2 )a,b- Firstly, in approach EO- 
ED1, we approximate these averages by the steady state solutions of the macroscopic rate equations 
corresponding to the fast reactions (j2J) and jSJ). Following the same steps as in the previous two 
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Reaction Propensity Reaction Propensity 

A + A^A 2 fc f f!A(nA- 1), k b nA 2 B + B^B 2 fc f n B (n B -l), k b n B2 

O + A 2 ^ OA2 k on nonA 2 , k gnoA 2 O + B 2 ^ OB 2 k on noriB 2 , k a finoB 2 

TABLE III: Reaction scheme for the preliminary simulations to compute the effective propensity functions given in Eqs. (Q2 

and (TT6l), for scheme EO-ED2. 



sections (applying equations (jlj) and (jTUl) ). we arrive at 



K>A E B = <™A>! E b + (Mfl) 2 + (14) 



1 + (K*K^ [((n A )l|) 2 + ((n B )J|) 
K)a| = ^B)J E B + 2(^)- 1 ((n B )f|) 2 + 



l + (#gJt*)-* [«n A > JJ) 2 + ((n B )J|) 2 _ ' 
which can be combined with relations (|13l) and inverted numerically^ to give (^a)^ and (^b)^- 

The other averages required in Table HT1 are obtained using relations (j3J, ([5]) and (TTTT) . Approach E0- 

ED1 is approximate, since it assumes that the average numbers of molecules that would be produced 

by long stochastic simulations of the fast reaction set are given by the steady-state solutions of the 

corresponding macroscopic rate equations. This approximation is avoided in approach EO-ED2, in 

which we calculate the averages of the fast variables A, A 2 , OA 2 , B, B 2 , OB 2 and O from the master 

equation corresponding to the coupled reactions ([2]) and ([H]). This is difficult to do directly (as in 

scheme ED2), so we use simulations. We carry out a series of short preliminary simulations, using 

the SSA, of reactions ([2]) and ([8]), for fixed values of and n^. The reaction scheme for these 

preliminary simulations is given in Table IHIl 

From these, we compute the averages required for the effective propensities 

OA.eff = ^prod f(™0 + n OA 2 )f^) (15) 

and 

/iA, ef r = vMfl- (16) 
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These propensities are stored in a lookup table which is referred to during the coarse-grained simu- 
lations of the slow variables. 

v. RESULTS 

We now assess the performance of the various coarse-graining approaches, in terms of how accu- 
rately they reproduce the behaviour of the full system, and how much they speed up the simulations. 
Key features of the behaviour of this system are its bimodal steady state probability distribution 
function and its spontaneous flips between the two stable states. We assess how well the coarse- 
grained simulations reproduce the bimodal probability distribution for the difference A in total 
number between the A and B proteins: 

A = n A + 2n M + 2n A 2 ~ ( n B + 2n B2 + 2n B 2 ) (17) 

We also test how well the various schemes reproduce the rate of fluctuation-induced switching be- 
tween the A and B-rich states, which we measure using the Forward Flux Sampling (FFS) rare event 
simulation method^ 1 ^ 1 ^. We compare our results to SSA simulations of the full reaction set (Table 
[TJ which we denote ORN ("Original Reaction Network"). The coarse-graining schemes are based on 
the assumption that the "fast" reactions are indeed fast compared to the slow reactions. We there- 
fore expect their accuracy to improve as the rate constants for the "fast" reactions increase. We will 
see that the accuracy also depends on which reactions we choose to be "fast" and how we compute 
the averages needed for the propensity functions. We end with a discussion on the computational 
performances of the different methods. 

A. Steady-state probability distribution 

We now compute the steady-state probability distribution function p(A) for the difference A be- 
tween the total numbers of A and B proteins, as given by Eq. ffTTj) . We expect p(A) to have two 
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peaks around the known stable steady states A = ±27 (see ref.—), and a "valley" around the un- 
stable steady state A = 0. To compute p(A), we carry out a long SSA simulation, during which we 
compile a histogram for the probability of finding the system at each A value. This procedure is 
repeated for all the coarse-grained simulation schemes in Table [Til However, it is hard to achieve 
good sampling of p(A) in the "valley" region close to A = 0, where the system is very unlikely to be 
found. In this region we use the FFS method to compute p(A) more accurately^. This method will 
be described briefly in the following section and in Appendix O Its use for computing steady state 
probability distributions is described in Ref.—. Results are shown in Figure El for the "baseline" 
parameter set given in section [TTJ As expected, p(A) is clearly bimodal and shows symmetric peaks 
flanking a "valley" at A = 0. The locations of the peaks and valley correspond to the stable and 
unstable solutions of a mean field analysis^ of the switch. Comparing the results for the different 
coarse-graining schemes in Figure El we see that they all appear to reproduce p(A) quite well, giving 
the correct position, height and width of the peaks. Inset A magnifies the left probability peak, 
showing that the only methods displaying a small systematic error are EDI and EO-ED1, i.e. the 
coarse-graining schemes relying on the solutions of the Macroscopic Rate Equation. In general, we 
can conclude that all the methods reproduce p(A) rather well in the peak regions. However, when 
we investigate in more detail the results for the "valley" region around A = 0, clear differences are 
observed between the coarse-graining methods. Inset B of Figure [3] shows on a logarithmic scale the 
results for p(A) in this region, generated using the FFS method. All the coarse-graining methods 
deviate from the results of the full reaction network (ORN). The apparent effect of removing dimeri- 
sation (ED1/ED2) is to shift the minimum up in probability, with the macroscopic rate equation 
approach (EDI) having a stronger effect. Removing operator state fluctuations (EO) has the op- 
posite effect, shifting the minimum down in probability. Methods EO-ED1 and EO-ED2 appear to 
show a combination of these two effects. Although these deviations from the ORN results are small, 
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FIG. 3: Probability distribution p(A) as a function of the order parameter A. Inset A zooms in on the left peaks and shows how 
all the methods are able to reproduce the positions and heights of the peaks. Inset B shows, on a logarithmic scale, deviations 
between the different coarse-graining schemes and the original network for the region around A = 0. FFS was used to sample 
p(A) in this region. 

they will turn out to be rather important for the dynamical switching behaviour to be discussed in 
the next section. However, if one is only interested in the steady state distribution, the choice of the 
particular coarse-graining method does not appear to be crucial. 

B. Rate of stochastic switch flipping 

In many cases, fluctuation-driven dynamical properties are an important output of a simulation 
of a biochemical network. This is especially true of genetic switches, where a key characteristic 
is the rate of flipping between stable states (as shown in Figure [2] for the model genetic switch). 
When simulating these systems, one requires not only an accurate representation of the steady-state 
distribution, but also of the dynamical behaviour of the system. We now test whether the various 
coarse-graining methods are able to reproduce the correct rate of stochastic flipping of the model 
switch. This is a particularly stringent test, since this fluctuation-driven process is likely to be highly 
sensitive to the accuracy with which dynamical fluctuations are reproduced in the different schemes. 
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FIG. 4: Switch flipping rate /cab as a function of the dimer-DNA association rate fc on , adjusting k g so that the equilibrium 
constant for DNA binding remains unchanged. The switch is more stable when operator binding/unbinding is rapid, suggesting 
that fluctuations in these reactions play an important role in the switch nipping. Methods that remove these reactions give 
a straight line in the figure; among those, only methods EO and EO-ED2 are able to capture the asymptotic behavior of the 
curve for k on — > oo. Method ED2 always gives a good approximation of the rate, while EDI consistently overestimates it by 
about one order of magnitude. 



To measure the rate /cab of stochastic switch flipping, we use the FFS method 20 ' 28 , which allows 
the calculation of rate constants and the sampling of transition paths for rare events in stochastic 
dynamical systems. Because rare events (such as switch flipping) happen infrequently, standard 
simulations have difficulty in achieving good statistical sampling. In FFS, simulation pathways 
corresponding to the rare events are generated efficiently via a series of interfaces, defined by a 
parameter A, separating the initial and final states. At the same time, the rate constant is calculated 
from the probabilities of reaching adjacent interfaces. FFS is described in more detail in Appendix 
ICl and in refs 2 ^ 2 ^. The method can also be used to obtain the steady state probability distribution 
as a function of the A parameter, as in inset B of Figured] 5 '. 

Figure H] shows the switch flipping rate /cab, as a function of the dimer-DNA association rate k on . 
The dimer-DNA dissociation rate is adjusted to keep K£ = 1. The other parameters are fixed at 
fcf = 5/Cp ro d, n = 0.3/cp ro d and K^ = l/5. For the full reaction network (ORN; solid line), the flipping 
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rate decreases as DNA binding becomes faster, reaching a plateau for very fast (/^on ^ 500/uprod) 
operator association/dissociation. The switch is more stable when operator binding/unbinding is 
rapid, suggesting that fluctuations in these reactions play an important role in switch flipping. For 
the EO method, in which protein-DNA association/dissociation reactions are "lost", the flipping rate 
does not depend on k on (since only the equilibrium constant Kjy features in this method and this is 
kept constant). As expected, the flipping rate for the EO method corresponds to the ORN result 
in the limit of large k on . When we coarse-grain over protein-protein interactions (EDI and ED2), 
our results are very dependent on whether the macroscopic rate equations or the master equation is 
used to compute propensity functions. For the rate equation approach (EDI), the decrease in &ab 
with k on is reproduced, but the switch is almost an order of magnitude less stable than for the full 
reaction set. However, when the chemical master equation is used to compute the propensities, the 
results are remarkably accurate - the ED2 approach gives switch flipping rates in good agreement 
with the full reaction set. The two methods EO-ED1 and EO-ED2, which coarse-grain over both 
DNA binding and dimerisation reactions, also show this behaviour: for EO-ED1, where the rate 
equation approximation is used, the switch flipping rate is similarly almost an order of magnitude 
too high, whereas for EO-ED2, where the master equation is used, &ab is indistinguishable from 
that given by method EO (where dimerisation is simulated explicitly). These results show that 
dimer-DNA binding plays an important role in switch flipping for association/dissociation rates 
in the physiological range, and that, for reliable coarse-graining, effective propensities need to be 
computed with the master equation rather than the macroscopic rate equation approximation. 

Figure shows the equivalent result when the monomer-monomer association rate fcf is varied, 
adjusting /c b so that Kf ) = l. The other parameters are k on = 5k prod , n = 0.3k pTod and 1^ = 1/5. For 
the full reaction set (ORN), &ab increases with kf. as the dimerisation reactions become faster, the 
switch becomes less stable. This is in contrast to the behaviour observed in Figure HI It appears 
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FIG. 5: Switch flipping rate &ab as a function of the protein-protein association rate kf, adjusting fcb so that the equilibrium 
constant for dimerisation remains unchanged. For the full reaction set, the switching rate increases with the rate of dimerisation. 
The EO curve consistently underestimates the rate by approximately an order of magnitude. The methods that remove the 
dimerisation reactions give a constant horizontal line. Among them, only methods EO-ED1 and EO give good results for high 
k{ , although for the latter we suspect this arises from a lucky cancellation of errors. 



that switch flipping is hindered by fluctuations in the monomer-dimer reactions. This apparently 
somewhat counter-intuitive result can perhaps be explained as follows: protein is produced in the 
monomer form. To flip the switch, it needs to dimerise and bind to the operator. If dimerisation 
is slow, the monomer may be degraded before it has a chance to dimerise, and in this case it 
does not contribute to flipping the switch. On the other hand, if dimerisation is fast, then every 
monomer that is produced makes a contribution to the dimer pool and can potentially bind to the 
operator, leading to switch flipping. The EO approach (eliminating dimer-DNA binding) shows the 
same increase in &ab with kf, but underestimates the value of &ab by about an order of magnitude. 
This supports our view that fluctuations in operator binding are important for switch flipping. 
On eliminating dimerisation fluctuations (EDI and ED2), we observe the same problem with the 
macroscopic rate equation approximation as in Figure H]- EDI produces a flipping rate that is too 
high, while ED2, where the master equation is used, gives good agreement with the full reaction set 
(ORN) in the high kf limit. When both protein-protein and protein-DNA association/dissociation 
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reactions are eliminated, method EO-ED2 gives results in agreement with EO in the high kf limit. 
Method EO-ED1 gives unexpectedly good results, in fairly close agreement with ED2 and ORN. 
However, given that we expect removing DNA binding to reduce the rate constant, while using the 
macroscopic rate equation approximation increases it, this is likely to be just a lucky cancellaton 
of errors for this particular parameter set. Figure [5] therefore demonstrates that fluctuations in the 
monomer-monomer association/dissocation reactions actually disfavour switch flipping. Moreover, 
as for Figure HJ we see that the macroscopic rate equation approximation is not reliable for predicting 
switch flipping rates, while coarse-graining over the dimerisation reactions using the master equation 
approach (ED2) becomes reliable when kf > 5/c pro d (for this parameter set). 

We have also tested the various coarse-graining approaches for the case where both protein-protein 
and protein-DNA association/dissociation reactions are fast (kf = 100fc pro d, k on = 100A;p ro d). In this 
case, as expected, methods EO, ED2 and EO-ED2 all give flipping rates in agreement with the full 
reaction set (ORN), while EDI and EO-ED1 do not. This indicates once again that in general the 
macroscopic rate equation approach is not suitable for computing switching rates. 

c. Computational Performance 

We measure the computational speed-up gained by the various coarse-graining approximations. 
Of course, this depends on the values that we choose for the rate constants. Table [TV] shows the 
CPU time, in seconds (on an AMD Athlon 1600+ processor), required for a simulation run of 
length 10 5 k~^ od , for various values of the rate constants k on for protein-DNA binding and kf for 
dimerisation. In all cases, fc g and k^ are adjusted so as to keep the equilibrium constants and 
fixed. Considering the full reaction network (ORN) - top row in the table - we observe that the 
CPU time is much more sensitive to the dimerisation rate than to the protein-DNA binding rate. 
This confirms that the SSA mostly executes monomer-monomer association and dimer dissociation 
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reactions, even when k on is much greater than k{. This is because the propensity for dimerisation 
depends on (roughly) the square of the number of free monomers, which is generally quite large. 
Protein-protein association/dissociation is therefore the performance bottleneck for this system. 
Bearing this in mind, it is not surprising that when we consider the next row in Table IIVI we see 
that removing protein-DNA association and dissociation reactions (EO), is only useful when the rate 
constants for these reactions are exceedingly large. Eliminating the protein-protein association and 
dissociation reactions (EDI and ED2) results in a dramatic speed-up compared to the ORN case 
(rows 3 and 4). This speed-up is most impressive when the dimerisation rate is high. There is no 
significant difference in CPU time between the EDI and ED2 methods, as solving the dimerisation 
master equation can be done analytically and takes only a negligible time. When we eliminate 
both protein-DNA and protein-protein association and dissociation (bottom two rows in Table [IVj) . 
we obtain a further factor 2-25 increase in speed. Again, there is no significant difference in CPU 
time between methods EO-ED1 and EO-ED2. In this case, the solution of the master equation in 
method EO-ED2 is done numerically, and it can take up to a few hours. However this procedure is 
performed in a separate simulation and the results are stored in a lookup table, to be used for all 
the simulations of the switch. Therefore, we do not include the time needed to generating this table 
in Table HVl We conclude that, in the physiological parameter range, some computational speed-up 
can be obtained by removing protein-DNA binding reactions; however, much more computer time 
can be saved by coarse-graining protein-protein association and dissociation reactions. 

vi. DISCUSSION 

Understanding cellular control systems will require the study of very complex biochemical reaction 
networks. Computer simulations clearly have an important contribution to make in this area, since 
they can provide quantitative understanding of how biochemical networks work. It is clear that 
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in many cases (including gene regulation), stochastic simulations are required. However, the more 
complex the biochemical network is, the more computationally expensive it is to simulate. Eliminat- 
ing fast reactions will be essential for simulating biochemical networks of the scale and complexity 
that is relevant for biological cells. It is therefore very important to understand how this can be 
done reliably, while preserving the correct dynamical features of the full reaction network. In this 
paper, we have made a systematic study of the computational speedup and accuracy of a range of 
coarse-graining schemes, for a model gene regulatory network. All gene regulatory networks involve 
protein-protein and protein-DNA interactions. These tend to be rapid in comparison to protein 
production (transcription, translation and folding) and removal from the cell (active degradation 
and dilution due to growth and division). We try to address the general question of what the con- 
sequences are of eliminating protein-protein or protein-DNA association and dissociation reactions 
from stochastic simulations of gene regulatory networks. We use as our case study a bistable ge- 
netic switch, since this gives us a very sensitive readout, in the form of the switch flipping rate, 
of the accuracy with which dynamical fluctuations are reproduced by the various coarse-graining 
schemes. We hope that our results will prove relevant to simulations of real genetic switches and 
gene regulatory networks in general. 
To coarse-grain the reaction scheme for the model genetic switch, within the context of Gillespie's 
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TABLE IV: CPU time (in seconds) required to simulate the system for i s im = 10 5 fcp r o d , for different parameter sets. Simulations 
were performed on an AMD Athlon 1600+ processor. The dissociation rates were scaled such that the equilibrium constants 
for dimerisation and operator binding were kept constant at Kf, = 1/5 and — 1. 
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Stochastic Simulation Algorithm (SSA), we have used the approach described by Bundschuh et al.—. 
Here, the reaction set is divided into "fast" and "slow" reactions. Chemical species whose number is 
changed by the fast reactions are designated "fast" . A set of "slow" chemical species is constructed, 
which consists of original species that were unaffected by the fast reactions, together with new species, 
formed from linear combinations of the fast species, such that their number is unaffected by the fast 
reactions. The slow reactions are then rewritten in terms of the set of slow species, with effective 
propensity functions that depend on averages (and in some cases variances) of the fast reaction set, 
for fixed numbers of molecules of the slow species. These averages may be obtained by explicit or 
numerical solution of the chemical master equation for the fast reactions. Alternatively, one may 
make the approximation that the averages are well represented by the steady-state solutions of the 
corresponding macroscopic rate equations for the fast reactions. In either case, having computed the 
effective propensity functions, one simply implements the SSA for the slow reaction set, propagating 
the set of slow variables, using these effective propensities. 

For the model genetic switch, we investigated the effects of eliminating protein-protein associ- 
ation/dissociation reactions, and/or protein-DNA association/dissociation reactions, from the full 
reaction set. We also compared the macroscopic rate equation approximation to the master equa- 
tion approach for computing the effective propensities. Using all the coarse-graining schemes, we 
computed the steady-state probability distribution as well as spontaneous switch flipping rates. We 
found that all the coarse-graining methods gave good agreement with the full reaction network for 
the steady-state probability distribution, although small deviations were observed around the unsta- 
ble steady state. However, dramatic differences were observed in the switch flipping rates computed 
using the different coarse-graining schemes. Elimination of protein-DNA association/dissociation in- 
creased the stability of the switch (but agreed with the full reaction set in the fast reaction limit). In 
contrast, elimination of protein-protein association/dissociation decreased the stability of the switch 
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(again, agreeing with the full reaction set in the fast reaction limit). However, over most of the range 
of parameters tested, protein-protein association/dissociation reactions can be eliminated with min- 
imal effect on switching rates, and with the advantage of an impressive computational speed-up. 
This result is likely to prove very useful when simulating complex and computationally expensive 
networks. The implications of these observations for the physics of the switching mechanism for this 
model switch will be investigated in a future publication 30 . 

We also observed that the macroscopic rate equation approximation does not produce reliable 
switching rates, even though the steady-state probability distribution is reasonably well reproduced. 
Typically, switching rates computed using the macroscopic rate equation approximation are an order 
of magnitude too high, even in the limit of fast reactions. In contrast, when the chemical master 
equation is used to compute the effective propensities, results are in excellent agreement with the 
full reaction set for fast reaction rates. These results serve as a warning that care must be taken in 
how coarse-graining is applied. As an example, the lysogeny- lysis switch of bacteriophage lambda 
is extremely stable to fluctuations 3 ^ 3 ^, a fact that computational modelling (using macroscopic 
approximations) has thus far been unable satisfactorily to explain 3 ^ -. In such a case, careful 
coarse-graining is crucially important. 

Our results show that under certain biologically relevant conditions fast reactions can be eliminated 
while preserving the correct dynamical characteristics of the system, even when highly sensitive 
fluctuation-driven quantities such as switch flipping rates are considered. This is very encouraging 
for the simulation of more complex reaction networks, and it would be interesting to apply these 
approaches to more complicated genetic switches, and also for other gene regulatory networks where 
dynamical fluctuations are important. We hope that this work will be of use as a "tutorial" in 
designing and implementing coarse-graining schemes, and that it may aid in pointing the way to 
accurate and efficient coarse-grained simulations of a wide variety of interesting and important 
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biochemical networks. 
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appendix A: SOLVING THE OPERATOR BINDING MASTER EQUATION 

In the EO approach, instead of solving the macroscopic rate equation for operator binding, one can 
solve the corresponding chemical Master Equation. However, as the operator states can be present 
only in copy number or 1, the state space is extremely limited, and the solution of the Master 
Equation coincides with the solution for the rate equation - as we demonstrate here. 

The Master Equation for reactions (b) in Table [I] is the following: 
d 

— p(n A2 ,n B2 ) = (Al) 

- p(n A2 ,n B2 )(k on n n A2 + k on n n B2 ) 

- p(n A2 ,n B2 )(k ofl noA2 + k sn B 2 
+ P(n-A 2 - l,nB 2 )K${noA2 + 1) 
+ p(n A2 ,n B2 - l)k oS (noB 2 + 1) 

+ p{n A2 + 1, n B2 )k on (n + 1)0a 2 - 1) 
+ p(n A2 ,n B2 + l)k on (n + l)(n B2 - 1). 

Only three states are possible: (0 = 1,A 2 ,B 2 ), (OA 2 = 1, A 2 -l, B 2 ) and (OB 2 = 1, A 2 , B 2 -l). This 
greatly simplifies Eq. flAll) : 

p(n A2 ,n B2 )k on (n A2 + n A2 ) = (A2) 
p{n A2 -l,n B2 ) k oS +p(n A2 ,n B2 -l) k oS , 

p(n A . 2 ,n B2 -l) = p(n A2 ,n B2 )k on n B2 , 
p(n A2 -l,n B2 ) = p(n A2 ,n B2 )k on n A2 . 
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The solutions of Eq. flA2j) can be easily computed: 

MfX = Pin A2 ,n B2 ) = 1 + ( ^ ) _ 1 1 (wAa + nBa) , (A3) 
(n OA2 )£\ =p(n A2 -l,n B ^ ^"Vy, 



(^)" 1 n B2 

A2 ,b 2 rvw* v l + (^)-Hn A2 +n B2 )- 



appendix B: SOLVING THE DIMERISATION MASTER EQUATION 

Following the approach described in 5 , the Master Equation for the reaction X + X ^ X2, with 
rate constants hi for association and fcb for dissociation, with system volume V, and given a total 
number or monomers+dimers n Xr , where n Xr = n x + 2n X2 , is the following: 



d 

p{n X2 \n XT ) = k h (n X2 + 1) p(n X2 + l\n Xr ) - (Bl) 



h{n Xr - 2n x , 2 )(n XT - 2n x , 2 - 1) 



2V 

h(n Xr - 2n X2 + 2)(n Xr - 2n X2 + 1 



+ hn X2 



p{n X2 \n Xr ) + 



p(n X2 -l\n Xj 



2V 

Eq. ( 1B1I) can be solved numerically in steady state (starting from an initial guess n X2 = 0), to 
obtain the exact probability distribution for the number of dimers n X2 , for a given total number 
n Xj , of momoners+dimers. The probability distribution for the monomer number can be trivially 
obtained from the dimer distribution noting that p{n x \n XT ) = n Xr — 2p{n X2 \n XT ). Eq. (IBlj) is 
solved for a range of values of tl Xt \ results are stored in look-up tables, which are later used to 
compute effective propensities for the coarse-grained simulations. 

appendix C: FORWARD FLUX SAMPLING 

Forward Flux Sampling (FFS)— is a method for sampling spontaneous transitions between 
two regions in phase space A and B, and for computing the rate constant for such transitions. A 
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and B are defined by an order parameter A, such that A < Aa in A and A > Ab in B. A series of 
nonintersecting surfaces A , ■ ■ • , A n are defined in phase space, such that A = Aa and A n = Ab, and 
such that any path from A to B must cross each interface, without reaching Aj + i before Aj. The 
transition rate /cab from A to B is the average flux $A,n of trajectories reaching B from A. This can 
be decomposed in the following way: 

n-1 

k A B = $A,n = $A,oP(A„|Ao) = $A,0 f[ P ( X i+A X i)- ( C1 ) 

i=0 

Here, &a.o is the average flux of trajectories leaving A in the direction of B and P(A„|Ao) is the 
probability that a trajectory that crosses Ao in the direction of B will eventually reach B before 
returning to A. On the right-hand side, P(Xi + i\Xi) is the probability that a trajectory which reaches 
Aj, having come from A, will reach Aj + i before returning to A. In Eq. llClj) . the flux of trajectories 
from A to B is split into the flux across the first interface Ao, multiplied by the probability of getting 
from that interface to B, without returning to A. This last term is then factorized in a product of 
conditional probabilities of reaching the next interface (before returning to A), having arrived at a 
particular interface from A. We note that this does not imply a Markov approximation 28 . 

The flux $a,o is obtained by running a simulation of the system in the "basin of attraction" of 
A and counting how many times the trajectory in phase space crosses Ao coming from A. At the 
same time, one generates a collection of phase space points that correspond to the moments that 
the trajectory reached Ao, moving in the direction of B. This collection of points is then used as 
the starting point for a calculation of P(Ai|Ao). A point from the collection is chosen at random 
and used to initiate a new trajectory, which is continued until either A or Ai is reached. If Ai is 
reached, the trial is designated a "success" . This is repeated many times, generating an estimate for 
^(Ai | A ) (the number of successes divided by the total number of trials), plus a new collection of 
points at Ai that are the end points of the successful trial runs. This collection of points is used to 
initiate trial runs to A2, generating an estimate for P(A2|Ai) and a new collection at A2, etc. FFS also 
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allows sampling of the trajectories corresponding to the transition (the transition path ensemble) by 
tracing back to A paths that eventually arrive in B. Further details on FFS are given in2&2&. The 
use of FFS to compute stationary probability distributions is demonstrated in^Z 
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Reaction Propensity Reaction Propensity 

Dimerisation A + A ^ A 2 fcfTiA^A — l)i fcb«A 2 B + B ^ B 2 fcfriB(nB — 1), fcb«B 2 a 

DNA binding O + A 2 # 0A 2 k on n n A21 k oS n A 2 O + B 2 # OB 2 k on n n B2 , fc ff"OB 2 b 

Production O — > O + A ^prod^o O — > O + B fc P rod«o c 

Production OA 2 — > OA 2 + A fc P rodnoA 2 OB 2 — > OB 2 + B fc P rod«OB 2 d 

Degradation A — > /ztia B — > /ztib e 



Table 1: Reactions and propensity functions for the model genetic switch. 
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Name 


Reaction 


Propensity 


Method 


Coarse-grained 




Definition 












variable 








EO 




0^ A 


fcprod(^0 + n OA 2 )^g 2 


Rate Equation 


A 2 




n k 2 = "A 2 + 


noA 2 






-> B 


fcprod(«0 + "-OB 2 )? E r 




B 2 




n B 2 = "B 2 + 


n OB 2 




A 




^on«A(«A-l), fc off ("-A 2 }f^B 2 














B 


+- B ^ B 2 

A — > W 


Lrip friR — 1) , fcr.fr (tir„ - 

' "Oil ' "D \ ' "r> / ) on \ 02 / a g 














+ 


2A # OA 2 


Kon\n>A 2 ) A ' K oS n OA 2 


— ; 

ricitc rLqiicitioii 






n A = n A + 


2n A2 




o + 


2B ^ 0B 2 


fcon(^B 2 )| E , fcoff"OB 2 




B 




n B = n B + 


2n B2 




(J 


— ► C + A 


fcprod'^O 














u 


— > C + r> 


fcprod^O 














UA2 




fcprod^OA 2 
















— > OBt 4- B 
A — > 


"'prod't'OB 2 

u (n A \? E 

r*\ l(, A/ A 














+ 


2k ^ 0A 2 


I, /„ , \ME l , 
Kon{nA 2 ) A , K gnoA 2 


— ; 

JVlastcr Equation 


A 
I \ 




n A = «a + 


2n A2 




o + 


2B # 0B 2 


fcon("-B 2 )g E , fcoff^OB 2 




B 




n B = n B + 


2n B2 




c 


— > C + A 


fcprod^O 














n 

v y 


— v O 4- R 


fcprod^O 














OA 2 


-> OA 2 + A 


"<piod' t C'A 2 














0B 2 


0B 2 +B 

A^0 
B -> 


fcprod^OB 2 












E0-ED1 




0^ A 


fc P rod(n + noA 2 )f E g 


Rate Equation 


A 


n A 


= ha + 2riA 2 


+ 2n A 2 






-> B 


fcprod("0 + n OB 2 )| E § 




B 


n B 


= n B + 2n B2 


+ 2n OB2 






A^0 


















B -> 














E0-ED2 




0^ A 


feprod<«0 + n OA 2 }^| 


Master Equation 


A 


n~ A 


= n A + 2n A2 


+ 2n A 2 






-> B 


fcprod("-0 + "'OB 2 }jjf! 




B 


n B 


= n B + 2n B2 


+ 2n B 2 






A^0 


















B -> 


M(»B>3f| 













Table 2: Summary of coarse-graining schemes for the original reaction set (Table HJ: eliminating operator binding 
(EO), eliminating dimerisation reactions using the Macroscopic Rate Equation (EDI) or the Master Equation (ED2), 
eliminating both dimerisation and operator binding using the Macroscopic Rate Equation (EO-ED1) or the Master 
Equation (EO-ED2). For each coarse-graining scheme, the coarse-grained reaction set is indicated together with the 
propensity function for each reaction. We also give definitions of the new slow variables for each scheme. 
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Reaction Propensity Reaction Propensity 

A + A^A 2 fcfriA(raA-l), fcb^A 2 B + B ^ B 2 /cfn B (riB-l), fcb«B 2 

O + A 2 OA 2 k on n n A2 , k off n A 2 O + B 2 ^ OB 2 k on n n B2 , k oB n B 2 

Table 3: Reaction scheme for the preliminary simulations to compute the effective propensity functions given in 
Eqs. (HU) and (TTll), for scheme EO-ED2. 



40 





k f = 5 


fc f = 100 


fcf = 5 


fcf = 100 






^on = 5 


fcon = 100 


fcon = 100 


ORN 


5.81 


113 


5.55 


118 


EO 


5.25 


103 


5.08 


70.7 


EDI 


0.18 


0.19 


1.94 


1.94 


ED2 


0.18 


0.19 


1.92 


1.91 


EO-ED1 


0.085 


0.082 


0.081 


0.081 


EO-ED2 


0.083 


0.082 


0.081 


0.084 



Table 4: CPU time (in seconds) required to simulate the system for t s - lm — 10 5 fc pl .Q d , for different parameter sets. 
Simulations were performed on an AMD Athlon 1600+ processor. The dissociation rates were scaled such that the 
equilibrium constants for dimerisation and operator binding were kept constant at — and K^ = l. 
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Pictorial representation of our model switch, corresponding to Table |TJ Two divergently-transcribed 
genes A and B are under the control of a single regulatory binding site O. Proteins A and B both form 

homodimers, each of which can bind to O and block transcription of the other gene 

Typical simulation trajectory for the model switch, with baseline parameters except for fj, which is 
replaced by /i = 0.45fc pro d- The total numbers of A and B molecules fluctuate around two stable states, 
one rich in A and the other rich in B. Transitions between these states are rapid, yet infrequent. . . , 
Probability distribution p(A) as a function of the order parameter A. Inset A zooms in on the left peaks 
and shows how all the methods are able to reproduce the positions and heights of the peaks. Inset B 
shows, on a logarithmic scale, deviations between the different coarse-graining schemes and the original 

network for the region around A = 0. FFS was used to sample p(X) in this region 

Switch flipping rate &ab as a function of the dimcr-DNA association rate fc on , adjusting k g so that the 
equilibrium constant for DNA binding remains unchanged. The switch is more stable when operator 
binding/unbinding is rapid, suggesting that fluctuations in these reactions play an important role in 
the switch flipping. Methods that remove these reactions give a straight line in the figure; among those, 
only methods EO and EO-ED2 are able to capture the asymptotic behavior of the curve for fc on — > oo. 
Method ED2 always gives a good approximation of the rate, while EDI consistently overestimates it 

by about one order of magnitude 

Switch flipping rate /cab as a function of the protein-protein association rate fcf, adjusting fcb so that 
the equilibrium constant for dimerisation remains unchanged. For the full reaction set, the switching 
rate increases with the rate of dimerisation. The EO curve consistently underestimates the rate by 
approximately an order of magnitude. The methods that remove the dimerisation reactions give a 
constant horizontal line. Among them, only methods EO-ED1 and EO give good results for high fcf, 
although for the latter we suspect this arises from a lucky cancellation of errors 
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